A Survey on Deep Generative 3D-aware Image Synthesis
Recent years have seen remarkable progress in deep-learning-powered visual content creation. This includes deep generative 3D-aware image synthesis, which produces high-fidelity images in a 3D-consistent manner while simultaneously capturing compact surfaces of objects from pure image collections, without the need for any 3D supervision, thus bridging the gap between 2D imagery and 3D reality. The field of computer vision has recently been captivated by this task, with hundreds of papers appearing in top-tier journals and conferences over the past few years (mainly the past two), yet a comprehensive survey of this remarkable and swift progress is still lacking. Our survey aims to introduce new researchers to the topic, provide a useful reference for related works, and stimulate future research directions through our discussion section. Beyond the papers presented here, we will continually update the latest relevant papers, along with corresponding implementations, at https://weihaox.github.io/3D-aware-Gen
Cali-Sketch: Stroke Calibration and Completion for High-Quality Face Image Generation from Poorly-Drawn Sketches
The image generation task has received increasing attention because of its wide
applications in security and entertainment. Sketch-based face generation makes
the process more engaging and improves generation quality through supervised
interaction. However, when a sketch poorly aligned with the true face is given as input,
existing supervised image-to-image translation methods often cannot generate
acceptable photo-realistic face images. To address this problem, in this paper
we propose Cali-Sketch, a poorly-drawn-sketch to photo-realistic-image
generation method. Cali-Sketch explicitly models stroke calibration and image
generation using two constituent networks: a Stroke Calibration Network (SCN),
which calibrates strokes of facial features and enriches facial details while
preserving the original intent; and an Image Synthesis Network (ISN),
which translates the calibrated and enriched sketches to photo-realistic face
images. In this way, we decompose a difficult cross-domain translation
problem into two easier steps. Extensive experiments verify that the face
photos generated by Cali-Sketch are both photo-realistic and faithful to the
input sketches, compared with state-of-the-art methods.
Comment: 10 pages, 12 figures
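To make the two-stage design above concrete, here is a minimal PyTorch sketch of the SCN-then-ISN pipeline. The class names mirror the abstract, but every architectural detail (layer counts, channel widths, activations) is an illustrative assumption, not the paper's actual networks.

```python
# Minimal sketch of Cali-Sketch's two-stage idea: calibrate the sketch first,
# then translate the calibrated sketch into a photo. Architectures are toy
# stand-ins for illustration only.
import torch
import torch.nn as nn

class StrokeCalibrationNetwork(nn.Module):
    """Refines a poorly drawn sketch into a calibrated, detail-enriched one."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 3, padding=1), nn.Tanh(),
        )

    def forward(self, sketch):
        return self.net(sketch)

class ImageSynthesisNetwork(nn.Module):
    """Translates a calibrated sketch into a photo-realistic face image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, calibrated_sketch):
        return self.net(calibrated_sketch)

scn, isn = StrokeCalibrationNetwork(), ImageSynthesisNetwork()
rough_sketch = torch.randn(1, 1, 256, 256)   # a poorly drawn input sketch
face = isn(scn(rough_sketch))                # the two easier steps, chained
```

Chaining the two modules is what lets the hard cross-domain problem split into a within-domain correction step and a conventional translation step.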
TediGAN: Text-Guided Diverse Face Image Generation and Manipulation
In this work, we propose TediGAN, a novel framework for multi-modal image
generation and manipulation with textual descriptions. The proposed method
consists of three components: StyleGAN inversion module, visual-linguistic
similarity learning, and instance-level optimization. The inversion module maps
real images to the latent space of a well-trained StyleGAN. The
visual-linguistic similarity module learns text-image matching by mapping the
image and text into a common embedding space. The instance-level optimization
is for identity preservation in manipulation. Our model can produce diverse and
high-quality images at an unprecedented resolution of 1024 x 1024. Using a control
mechanism based on style-mixing, our TediGAN inherently supports image
synthesis with multi-modal inputs, such as sketches or semantic labels, with or
without instance guidance. To facilitate text-guided multi-modal synthesis, we
propose the Multi-Modal CelebA-HQ, a large-scale dataset consisting of real
face images, each with a corresponding semantic segmentation map, sketch, and textual
descriptions. Extensive experiments on the introduced dataset demonstrate the
superior performance of our proposed method. Code and data are available at
https://github.com/weihaox/TediGAN.
Comment: CVPR 2021. Code: https://github.com/weihaox/TediGAN Data: https://github.com/weihaox/Multi-Modal-CelebA-HQ Video: https://youtu.be/L8Na2f5viA
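The visual-linguistic similarity component can be illustrated with a small PyTorch sketch: project image and text features into a common embedding space and train matched pairs to be close. The encoders, dimensions, and cosine-based loss below are assumptions for illustration, not TediGAN's exact formulation.

```python
# Illustrative joint embedding for text-image matching: two linear projections
# into a shared space, with a loss that pulls matched pairs together.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    def __init__(self, img_dim=512, txt_dim=768, emb_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, emb_dim)  # image branch
        self.txt_proj = nn.Linear(txt_dim, emb_dim)  # text branch

    def forward(self, img_feat, txt_feat):
        z_img = F.normalize(self.img_proj(img_feat), dim=-1)
        z_txt = F.normalize(self.txt_proj(txt_feat), dim=-1)
        return z_img, z_txt

def matching_loss(z_img, z_txt):
    # Matched image/text pairs should have high cosine similarity.
    return (1.0 - (z_img * z_txt).sum(dim=-1)).mean()

enc = JointEmbedding()
z_i, z_t = enc(torch.randn(4, 512), torch.randn(4, 768))
loss = matching_loss(z_i, z_t)
```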
Domain Fingerprints for No-reference Image Quality Assessment
Human fingerprints are detailed and nearly unique markers of human identity.
Such a unique and stable fingerprint is also left on each acquired image. It
can reveal how an image was degraded during the image acquisition procedure and
thus is closely related to the quality of an image. In this work, we propose a
new no-reference image quality assessment (NR-IQA) approach called domain-aware
IQA (DA-IQA), which for the first time introduces the concept of domain
fingerprint to the NR-IQA field. The domain fingerprint of an image is learned
from image collections of different degradations and then used as the unique
characteristics to identify the degradation sources and assess the quality of
the image. To this end, we design a new domain-aware architecture, which
enables simultaneous determination of both the distortion sources and the
quality of an image. With the distortion in an image better characterized, the
image quality can be more accurately assessed, as verified by extensive
experiments, which show that the proposed DA-IQA performs better than almost
all the compared state-of-the-art NR-IQA methods.
Comment: Accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT).
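A rough PyTorch sketch of a domain-aware architecture in the spirit described above: a shared feature extractor feeds two heads, one identifying the distortion source (the domain fingerprint) and one regressing image quality. The backbone and layer sizes are placeholder assumptions, not the actual DA-IQA design.

```python
# Two-head sketch: joint distortion classification and quality regression
# from a shared representation, so the characterized distortion can inform
# the quality estimate.
import torch
import torch.nn as nn

class DomainAwareIQA(nn.Module):
    def __init__(self, num_distortions=5, feat_dim=512):
        super().__init__()
        self.backbone = nn.Sequential(        # stand-in feature extractor
            nn.Conv2d(3, feat_dim, 7, stride=4), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.distortion_head = nn.Linear(feat_dim, num_distortions)
        self.quality_head = nn.Linear(feat_dim, 1)

    def forward(self, x):
        f = self.backbone(x)
        return self.distortion_head(f), self.quality_head(f)

model = DomainAwareIQA()
distortion_logits, quality = model(torch.randn(2, 3, 224, 224))
```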
Unsupervised Multi-Domain Multimodal Image-to-Image Translation with Explicit Domain-Constrained Disentanglement
Image-to-image translation has drawn great attention during the past few
years. It aims to translate an image in one domain to a given reference image
in another domain. Due to its effectiveness and efficiency, many applications
can be formulated as image-to-image translation problems. However, three main
challenges remain in image-to-image translation: 1) the lack of large amounts
of aligned training pairs for different tasks; 2) the ambiguity of multiple
possible outputs from a single input image; and 3) the lack of simultaneous
training of multiple datasets from different domains within a single network.
We also found in experiments that the implicit disentanglement of content and
style can lead to unexpected results. In this paper, we propose a unified
framework for learning to generate diverse outputs using unpaired training data
and allow simultaneous training of multiple datasets from different domains via
a single network. Furthermore, we also investigate how to better extract domain
supervision information so as to learn better disentangled representations and
achieve better image translation. Experiments show that the proposed method
outperforms or is comparable with state-of-the-art methods.
Comment: 20 pages
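The explicit domain-constrained disentanglement idea can be sketched as a domain-invariant content encoder paired with a domain-conditioned style code. The toy PyTorch module below is an assumption-laden illustration of that split, not the paper's network.

```python
# Toy disentangled translator: content comes from the image (shared across
# domains), style comes from a one-hot domain label, and the decoder fuses
# both to synthesize the translated image.
import torch
import torch.nn as nn

class DisentangledTranslator(nn.Module):
    def __init__(self, num_domains=3, style_dim=8):
        super().__init__()
        self.content_enc = nn.Conv2d(3, 64, 3, padding=1)    # domain-invariant
        self.style_enc = nn.Linear(num_domains, style_dim)   # domain-conditioned
        self.decoder = nn.Conv2d(64 + style_dim, 3, 3, padding=1)

    def forward(self, image, domain_onehot):
        c = self.content_enc(image)
        s = self.style_enc(domain_onehot)
        s = s[:, :, None, None].expand(-1, -1, c.size(2), c.size(3))
        return self.decoder(torch.cat([c, s], dim=1))

net = DisentangledTranslator()
out = net(torch.randn(1, 3, 128, 128), torch.eye(3)[:1])  # translate to domain 0
```

Conditioning the style branch on an explicit domain label is one simple way to realize the "explicit domain-constrained" supervision the abstract argues for, in contrast to leaving the content/style split implicit.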
GAN Inversion: A Survey
GAN inversion aims to invert a given image back into the latent space of a
pretrained GAN model so that the image can be faithfully reconstructed from the
inverted code by the generator. As an emerging technique to bridge the real and
fake image domains, GAN inversion plays an essential role in enabling the
pretrained GAN models such as StyleGAN and BigGAN to be used for real image
editing applications. Meanwhile, GAN inversion also provides insights into the
interpretation of the GAN latent space and how realistic images are
generated. In this paper, we provide an overview of GAN inversion with a focus
on its recent algorithms and applications. We cover important techniques of GAN
inversion and their applications to image restoration and image manipulation.
We further elaborate on some trends and challenges for future directions.
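A common baseline among the inversion algorithms such a survey covers is optimization-based inversion: keep the pretrained generator fixed and optimize a latent code so the generated image reconstructs the target. The sketch below uses a plain MSE objective and a toy stand-in generator; practical methods typically add perceptual losses and better initializations.

```python
# Optimization-based GAN inversion sketch: gradient-descend on the latent
# code z while the generator's weights stay frozen (only z is updated).
import torch

def invert(generator, target, steps=500, lr=0.01):
    z = torch.randn(1, 512, requires_grad=True)   # latent code to optimize
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(generator(z), target)
        loss.backward()
        opt.step()                                # updates z only
    return z.detach()                             # the inverted code

g = torch.nn.Sequential(torch.nn.Linear(512, 64))  # toy stand-in generator
z_hat = invert(g, target=torch.randn(1, 64), steps=10)
```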
Adaptive Rotated Convolution for Rotated Object Detection
Rotated object detection aims to identify and locate objects in images with
arbitrary orientation. In this scenario, the oriented directions of objects
vary considerably across different images, while multiple orientations of
objects exist within an image. This intrinsic characteristic makes it
challenging for standard backbone networks to extract high-quality features of
these arbitrarily oriented objects. In this paper, we present the Adaptive
Rotated Convolution (ARC) module to handle the aforementioned challenges. In
our ARC module, the convolution kernels rotate adaptively to extract object
features with varying orientations in different images, and an efficient
conditional computation mechanism is introduced to accommodate the large
orientation variations of objects within an image. The two designs work
seamlessly together in the rotated object detection problem. Moreover, ARC can conveniently
serve as a plug-and-play module in various vision backbones to boost their
representation ability to detect oriented objects accurately. Experiments on
commonly used benchmarks (DOTA and HRSC2016) demonstrate that equipped with our
proposed ARC module in the backbone network, the performance of multiple
popular oriented object detectors is significantly improved (e.g. +3.03% mAP on
Rotated RetinaNet and +4.16% on CFA). Combined with the highly competitive
method Oriented R-CNN, the proposed approach achieves state-of-the-art
performance on the DOTA dataset with 81.77% mAP.
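The core mechanism can be sketched in PyTorch as follows: a tiny routing network predicts a rotation angle from the input feature map, the convolution kernel's sampling grid is rotated by that angle via grid_sample, and the rotated kernel is applied with a standard convolution. The single per-batch angle and the minimal routing network here are simplifications of the paper's adaptive, per-image design.

```python
# Simplified adaptive rotated convolution: predict an angle from the input,
# rotate the kernel weights by resampling them on a rotated grid, then apply
# an ordinary convolution with the rotated kernel.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveRotatedConv(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.1)
        self.angle_pred = nn.Sequential(            # tiny routing network
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_ch, 1)
        )

    def forward(self, x):
        theta = self.angle_pred(x).mean()           # one angle (simplification)
        cos, sin = torch.cos(theta), torch.sin(theta)
        # 2x3 affine matrix that rotates the kernel's sampling grid.
        rot = torch.stack([
            torch.stack([cos, -sin, torch.zeros_like(cos)]),
            torch.stack([sin, cos, torch.zeros_like(cos)]),
        ]).unsqueeze(0).expand(self.weight.size(0), -1, -1)
        grid = F.affine_grid(rot, self.weight.shape, align_corners=False)
        w_rot = F.grid_sample(self.weight, grid, align_corners=False)
        return F.conv2d(x, w_rot, padding=self.weight.size(-1) // 2)

layer = AdaptiveRotatedConv(16, 32)
y = layer(torch.randn(1, 16, 64, 64))               # -> (1, 32, 64, 64)
```

Because the module keeps a standard conv2d interface, it can slot into an existing backbone in place of a regular convolution, which is what makes the plug-and-play claim plausible.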